Haplotype Structure

Authors

  • Yu Zhang
  • Tianhua Niu
Abstract

This chapter consists of five parts. In the first part, we define important terms and concepts used in studies of population haplotype structure. In the second part, we introduce valuable publicly available genotype/haplotype databases, such as those generated by the International HapMap Project. In the third part, we provide concise guides on how to download genotype data from the HapMap web site, how to use the Haploview program, and how to perform haplotype simulation. In the fourth part, we provide guides to several widely used haplotype inference methods, including Clark's algorithm, PHASE, HAPLOTYPER, and CHB. In the fifth part, we present two popular software packages, LDhat and HOTSPOTTER, for estimating recombination rates.

T. Niu (corresponding author): Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, 1670 Discovery Drive, Suite 110, Charlottesville, VA 22911, e-mail: [email protected]
In S. Lin and H. Zhao (eds.), Handbook on Analyzing Human Genetic Data, DOI 10.1007/978-3-540-69264-5_2, © Springer-Verlag Berlin Heidelberg 2010

1 Population Haplotype Structure

1.1 Haplotype Block Structure in Human Populations

Based on empirical studies, the human genome can be viewed as a series of high linkage disequilibrium (LD) regions separated by discrete segments of very low LD [28, 30, 41]. Genetic markers located within a high-LD region are inherited from generation to generation essentially as a single unit. For example, Daly et al. [28] found that a 500-kb region covering 103 single nucleotide polymorphisms (SNPs) on chromosome 5q31 could be partitioned into 11 haplotype blocks (99 SNPs lay within these blocks and four SNPs lay outside them). Within each block, two to four haplotypes accounted for at least 90% of the haplotype variation in their sample [28]. In another study, of SNPs located in a 216-kb segment of the class II region of the major histocompatibility complex (MHC) in 50 sperm samples from British men, Jeffreys et al. [34] revealed that recombination hotspots had caused block-like LD structures.

These results led to the concept of haplotype blocks. Haplotype blocks are defined as long stretches of DNA along a chromosome that have low recombination rates, exhibit high LD, and are characterized by relatively few haplotypes [39]. Furthermore, adjacent blocks are presumably separated by recombination hotspots, which are short regions with high recombination rates. Recombination hotspots (or coldspots) are defined as regions of the human genome with higher (or lower) recombination fractions than would be expected from the genome-average recombination rate of 1 cM/Mb [27]. However, recombination hotspots (or coldspots) can also be defined relative to local recombination rates: DNA segments that undergo more (or less) recombination than their surrounding regions can likewise be called recombination hotspots (or coldspots). Note also that the term recombination hotspot (or coldspot) can refer to chromosomal segments that vary considerably in size. Popular software packages for identifying haplotype blocks include HapBlock [47], HaploBlock [32], and HaploBlockFinder [46].

Although the presence of recombination hotspots can result in discrete haplotype blocks [28, 30, 31, 43], a notion that appears to be supported by sperm-typing studies of the class II region of the MHC [34], coalescent simulations demonstrate that a model assuming randomly distributed recombination events can also produce haplotype block-like structures [45]. Furthermore, by using the four-gamete test (FGT [14]) to define haplotype blocks, Wang et al.
[45] showed that the empirical chromosome 21 SNP dataset [41] is also consistent with a randomly distributed recombination model (i.e., without hotspots) in which the recombination rate varies across the chromosome.

Fig. 1 shows a schematic diagram of an idealized haplotype block structure. This structure implies that a disease-causing mutation is often introduced on a specific haplotype background. Delineating the haplotype block structure would help in selecting a minimal set of haplotype-tagging SNPs (htSNPs) when searching for disease-causing mutations [35]. Selecting htSNPs not only ensures that most of the haplotypic variation is captured but also dramatically reduces the genotyping cost compared with an exhaustive SNP coverage approach. It should be cautioned that recombination hotspot intensities vary, so haplotype block boundaries are often not sharp; typically, each hotspot corresponds to a genomic region 1–2 kb in length [34].

[Figure 1 graphic: haplotypes I–V with frequencies 32%, 28%, 19%, 16%, and 5%; Hotspots A and B; Blocks 1–3.]

Fig. 1 Recombination disrupts the configurations of ancestral haplotypes as they are passed on from generation to generation. Each square represents a specific allele (white: wild-type allele; gray: variant allele) at a particular SNP position. The entire haplotype encompasses 21 SNPs. The presence of two recombination hotspots (A and B) gives this region a block-wise structure composed of three discrete blocks: Block 1 (5 SNPs), Block 2 (10 SNPs), and Block 3 (6 SNPs). Recombination hotspots A and B reshuffle the sub-haplotypes across the three blocks, creating the overall block-like haplotype structure observed in population genetic studies.

1.2 Wright–Fisher Model

In the 1930s, both Ronald A. Fisher [18] and Sewall Wright [19] developed a stochastic model that allows a mathematical description of population reproduction. This model, now known as the Wright–Fisher model, is widely used as the canonical model of genetic drift in populations. It rests on the following assumptions:

1. Constant diploid population of size N (2N alleles)
2. Synchronized, nonoverlapping generations
3. Random mating
4. No recombination
5. No selection
6. No migration to or from other populations
7. Mutations are neutral and occur at a constant rate μ per generation

A schematic illustration of the Wright–Fisher model and the genealogical tree of two gene copies from the present generation is shown in Fig. 2.

The Wright–Fisher model is a simple binomial model of the genetic randomness that sampling introduces into a population of alleles. Assuming a haploid population of size 2N, the distribution of allele counts can be described by a Markov chain with binomial transition probabilities (for bi-allelic SNPs). More specifically, let $X_i$ denote the number of copies of a particular allele in the ith generation; then the distribution of the allele count in the next generation, $X_{i+1}$, can be expressed as

$$P(X_{i+1} = b \mid X_i = a) = \binom{2N}{b} \left(\frac{a}{2N}\right)^{b} \left(1 - \frac{a}{2N}\right)^{2N-b}, \qquad b = 0, 1, \ldots, 2N.$$
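The binomial transition of the Wright–Fisher model can be made concrete with a short sketch. Each of the 2N allele copies in generation i+1 is drawn independently from generation i, so with p = a/(2N), the count $X_{i+1}$ given $X_i = a$ follows Binomial(2N, p), that is, $P(X_{i+1} = b \mid X_i = a) = \binom{2N}{b} p^b (1-p)^{2N-b}$. The function names below are hypothetical, and the simulation deliberately omits mutation (assumption 7) so that only drift acts; treat it as an illustration under those assumptions, not as code from this chapter.

```python
import math
import random


def wf_transition_prob(a: int, b: int, n: int) -> float:
    """P(X_{i+1} = b | X_i = a) in a diploid population of size n (2n allele copies)."""
    two_n = 2 * n
    p = a / two_n  # frequency of the focal allele in generation i
    # Binomial(2N, p) probability of drawing exactly b copies of the focal allele.
    return math.comb(two_n, b) * p**b * (1 - p) ** (two_n - b)


def simulate_wf(a0: int, n: int, generations: int, seed: int = 0) -> list[int]:
    """Forward-simulate the focal allele count under pure drift (no mutation)."""
    rng = random.Random(seed)
    two_n = 2 * n
    counts = [a0]
    a = a0
    for _ in range(generations):
        p = a / two_n
        # Each of the 2N copies in the next generation copies a randomly chosen
        # parental copy, which carries the focal allele with probability p.
        a = sum(1 for _ in range(two_n) if rng.random() < p)
        counts.append(a)
    return counts


# Probability that a count of 3 (out of 2N = 10 copies) stays at 3.
print(round(wf_transition_prob(3, 3, 5), 4))  # 0.2668
print(simulate_wf(a0=10, n=10, generations=20, seed=1))
```

Because the binomial mean is 2N·p = a, the expected allele count is unchanged from one generation to the next; only random drift moves it, and the counts 0 and 2N are absorbing states (loss and fixation of the allele).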

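Section 1.1 mentions that Wang et al. [45] used the four-gamete test (FGT [14]) to define haplotype blocks. Under the infinite-sites assumption (no recurrent mutation), observing all four two-locus haplotypes 00, 01, 10, and 11 at a pair of SNPs implies at least one historical recombination between them. The sketch below applies that test to phased haplotypes coded as 0/1 strings and partitions the SNPs greedily from left to right. The greedy rule, the function names, and the optional `min_freq` threshold for ignoring rare gametes are assumptions made for this illustration, not necessarily the exact procedure of Wang et al.

```python
from collections import Counter
from collections.abc import Sequence


def four_gametes(haplotypes: Sequence[str], i: int, j: int, min_freq: float = 0.0) -> bool:
    """True if SNPs i and j show all four gametes (00, 01, 10, 11).

    A gamete counts as observed only if its sample frequency exceeds
    min_freq (hypothetical parameter; 0.0 means any occurrence counts).
    """
    counts = Counter((h[i], h[j]) for h in haplotypes)
    n = len(haplotypes)
    observed = [g for g, c in counts.items() if c / n > min_freq]
    return len(observed) == 4


def fgt_blocks(haplotypes: Sequence[str], min_freq: float = 0.0) -> list[tuple[int, int]]:
    """Greedy left-to-right partition into blocks of consecutive SNPs.

    A SNP joins the current block only if it shows fewer than four gametes
    with every SNP already in that block; otherwise a new block starts.
    Returns inclusive (start, end) SNP index pairs.
    """
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    for j in range(1, n_snps):
        if any(four_gametes(haplotypes, i, j, min_freq) for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, n_snps - 1))
    return blocks


# Toy phased haplotypes (rows) over five SNPs (columns).
haps = ["00011", "00111", "11000", "11100"]
print(fgt_blocks(haps))  # [(0, 1), (2, 2), (3, 4)]
```

In the toy data, SNPs 0–1 and SNPs 3–4 each form a block, while SNP 2 shows all four gametes with the SNPs on both sides and ends up as a singleton block. Real block-finding tools add further criteria, so this should be read only as a demonstration of the test itself.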



Publication date: 2017